
[RISCV] Convert LWU to LW if possible in RISCVOptWInstrs #144703


Open
wants to merge 6 commits into base: main

Conversation

asb
Contributor

@asb asb commented Jun 18, 2025

The original version of this patch handled LWU => LW and LHU => LH. This revised version limits it just to LWU, which is better motivated (it reduces the diff vs RV32, and LW is compressible while LWU is not).


This is currently implemented as part of RISCVOptWInstrs in order to reuse hasAllNBitUsers. However a new home or further refactoring will be needed (see the end of this note).
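
For context, the check being reused here is roughly "every user of the loaded value reads at most the low N bits of it". A minimal conceptual sketch of that idea is below; the function name and the tiny opcode list are illustrative assumptions, and the real hasAllNBitUsers in RISCVOptWInstrs.cpp handles many more opcodes and looks through copies:

static bool allUsersReadOnlyLow32Bits(const MachineInstr &MI,
                                      const MachineRegisterInfo &MRI) {
  // MI is the zero-extending load; operand 0 is the register it defines.
  Register DefReg = MI.getOperand(0).getReg();
  for (const MachineInstr &UserMI : MRI.use_nodbg_instructions(DefReg)) {
    switch (UserMI.getOpcode()) {
    case RISCV::ADDW:
    case RISCV::SLLW:
    case RISCV::FCVT_D_WU:
      // These read at most the low 32 bits of their GPR source, so they
      // cannot tell the difference between LWU and LW feeding them.
      continue;
    default:
      // Conservatively assume any other user may read the upper bits.
      return false;
    }
  }
  return true;
}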

Why prefer sign-extended loads?

  • LW is compressible while LWU is not.
  • Helps to minimise the diff vs RV32 (e.g. LWU vs LW)
  • Helps to minimise distracting diffs vs GCC. I see this come up frequently when comparing GCC code and in these cases it's a red herring.

Issues or open questions with this patch as it stands:

  • Doing something at the MI level makes sense as a last resort. I wonder if for some of the cases we could be producing a sign-extended load earlier on.
  • RISCVOptWInstrs is a slightly awkward home. It's currently not run for RV32, which means the LHU changes can actually add new diffs vs RV32. Potentially just this one transformation could be done for RV32.
  • Do we want to perform additional load narrowing? With the existing code, an LD will be narrowed to LW if only the lower bits are needed. The example that made me look at extending load normalisation in the first place was an LWU that immediately has a small mask applied, which could be narrowed to LB. It's not clear this has any real benefit.
  • As I've put the specific home of this change as an open question, I haven't added MIR tests for RISCVOptWInstrs. I'll add that if we decide this is the right home.

The first version of the patch that included LBU->LB conversion changed 95k instructions across a compile of llvm-test-suite (including SPEC 2017). As was pointed out, Zcb has C_LH and C_LHU but only C_LBU; there is no C_LB (and the base C extension has no compressed byte or halfword loads at all). Therefore, we avoid converting LBU to LB as it is less compressible. This leaves us with ~36k instructions changed across a compile of llvm-test-suite.
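
As a concrete illustration of the LWU => LW case, the fcvt_d_wu_load test updated by this patch corresponds to source along these lines (a sketch; the C++ function name is just for this example). The loaded 32-bit value is consumed only by fcvt.d.wu, which reads just the low 32 bits of its source register, so the zero-extending LWU can safely become LW; LW has a compressed form (C_LW) while LWU does not, and the RV64 output then matches RV32.

// Illustrative source for the fcvt_d_wu_load pattern: the unsigned 32-bit
// load feeds only a conversion that reads the low 32 bits of its source.
double convert_u32_to_double(const unsigned *p) {
  return static_cast<double>(*p);
}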

This is currently implemented as part of RISCVOptWInstrs in order to
reuse hasAllNBitUsers. However a new home or further refactoring will be
needed (see the end of this note).

Why prefer sign-extended loads?
* Sign-extending loads are more compressible. There is no compressed
  LWU, and compressed LBU and LHU are only available with Zcb.
* Helps to minimise the diff vs RV32 (e.g. LWU vs LW)
* Helps to minimise distracting diffs vs GCC. I see this come up
  frequently when comparing GCC code and in these cases it's a red
  herring.

Issues or open questions with this patch as it stands:
* Doing something at the MI level makes sense as a last resort. I wonder
  if for some of the cases we could be producing a sign-extended load
  earlier on.
* RISCVOptWInstrs is a slightly awkward home. It's currently not run for
  RV32, which means the LBU/LHU changes can actually add new diffs vs
  RV32. Potentially just this one transformation could be done for RV32.
* Do we want to perform additional load narrowing? With the existing
  code, an LD will be narrowed to LW if only the lower bits are needed.
  The example that made me look at extending load normalisation in the
  first place was an LWU that immediately has a small mask applied, which
  could be narrowed to LB. It's not clear this has any real benefit.

This patch changes 95k instructions across a compile of llvm-test-suite
(including SPEC 2017), and all tests complete successfully afterwards.
@llvmbot
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-backend-risc-v

Author: Alex Bradbury (asb)

Changes

This is currently implemented as part of RISCVOptWInstrs in order to reuse hasAllNBitUsers. However a new home or further refactoring will be needed (see the end of this note).

Why prefer sign-extended loads?

  • Sign-extending loads are more compressible. There is no compressed LWU, and compressed LBU and LHU are only available with Zcb.
  • Helps to minimise the diff vs RV32 (e.g. LWU vs LW)
  • Helps to minimise distracting diffs vs GCC. I see this come up frequently when comparing GCC code and in these cases it's a red herring.

Issues or open questions with this patch as it stands:

  • Doing something at the MI level makes sense as a last resort. I wonder if for some of the cases we could be producing a sign-extended load earlier on.
  • RISCVOptWInstrs is a slightly awkward home. It's currently not run for RV32, which means the LBU/LHU changes can actually add new diffs vs RV32. Potentially just this one transformation could be done for RV32.
  • Do we want to perform additional load narrowing? With the existing code, an LD will be narrowed to LW if only the lower bits are needed. The example that made me look at extending load normalisation in the first place was an LWU that immediately has a small mask applied, which could be narrowed to LB. It's not clear this has any real benefit.

This patch changes 95k instructions across a compile of llvm-test-suite (including SPEC 2017), and all tests complete successfully afterwards.


Patch is 468.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/144703.diff

58 Files Affected:

  • (modified) llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp (+45)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll (+5-11)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll (+5-11)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll (+96-96)
  • (modified) llvm/test/CodeGen/RISCV/atomic-signext.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/atomicrmw-cond-sub-clamp.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/bf16-promote.ll (+30-15)
  • (modified) llvm/test/CodeGen/RISCV/bfloat-convert.ll (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/bfloat.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/ctz_zero_return_test.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/double-convert-strict.ll (+6-12)
  • (modified) llvm/test/CodeGen/RISCV/double-convert.ll (+6-12)
  • (modified) llvm/test/CodeGen/RISCV/float-convert-strict.ll (+10-22)
  • (modified) llvm/test/CodeGen/RISCV/float-convert.ll (+10-22)
  • (modified) llvm/test/CodeGen/RISCV/fold-mem-offset.ll (+132-65)
  • (modified) llvm/test/CodeGen/RISCV/half-arith.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/half-convert-strict.ll (+15-27)
  • (modified) llvm/test/CodeGen/RISCV/half-convert.ll (+21-39)
  • (modified) llvm/test/CodeGen/RISCV/hoist-global-addr-base.ll (+15-7)
  • (modified) llvm/test/CodeGen/RISCV/local-stack-slot-allocation.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/mem64.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/memcmp-optsize.ll (+8-8)
  • (modified) llvm/test/CodeGen/RISCV/memcmp.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/memcpy-inline.ll (+94-94)
  • (modified) llvm/test/CodeGen/RISCV/memcpy.ll (+32-32)
  • (modified) llvm/test/CodeGen/RISCV/memmove.ll (+35-35)
  • (modified) llvm/test/CodeGen/RISCV/nontemporal.ll (+160-160)
  • (modified) llvm/test/CodeGen/RISCV/prefer-w-inst.mir (+2-2)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rv64zbkb.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/rvv/expandload.ll (+512-512)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vector-i8-index-cornercase.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-extract-subvector.ll (+33-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-buildvec.ll (+97-97)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int-vrgather.ll (+32-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-mask-load-store.ll (-23)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-masked-gather.ll (+87-87)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-int.ll (+17-8)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-load-store-asm.ll (+5-5)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-strided-vpload.ll (+2-7)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-unaligned.ll (+4-4)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwaddu.ll (+31-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulsu.ll (+31-15)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwmulu.ll (-14)
  • (modified) llvm/test/CodeGen/RISCV/rvv/fixed-vectors-vwsubu.ll (+35-17)
  • (modified) llvm/test/CodeGen/RISCV/rvv/memcpy-inline.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/rvv/stores-of-loads-merging.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/rvv/strided-vpload.ll (+15-6)
  • (modified) llvm/test/CodeGen/RISCV/srem-seteq-illegal-types.ll (+3-3)
  • (modified) llvm/test/CodeGen/RISCV/stack-clash-prologue.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/unaligned-load-store.ll (+13-13)
  • (modified) llvm/test/CodeGen/RISCV/urem-seteq-illegal-types.ll (+1-1)
  • (modified) llvm/test/CodeGen/RISCV/urem-vector-lkk.ll (+6-6)
  • (modified) llvm/test/CodeGen/RISCV/wide-scalar-shift-by-byte-multiple-legalization.ll (+132-132)
  • (modified) llvm/test/CodeGen/RISCV/wide-scalar-shift-legalization.ll (+72-72)
  • (modified) llvm/test/CodeGen/RISCV/zdinx-boundary-check.ll (+3-3)
diff --git a/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp b/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
index ed61236415ccf..a141bae55b70f 100644
--- a/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
+++ b/llvm/lib/Target/RISCV/RISCVOptWInstrs.cpp
@@ -71,6 +71,8 @@ class RISCVOptWInstrs : public MachineFunctionPass {
                       const RISCVSubtarget &ST, MachineRegisterInfo &MRI);
   bool appendWSuffixes(MachineFunction &MF, const RISCVInstrInfo &TII,
                        const RISCVSubtarget &ST, MachineRegisterInfo &MRI);
+  bool convertZExtLoads(MachineFunction &MF, const RISCVInstrInfo &TII,
+                       const RISCVSubtarget &ST, MachineRegisterInfo &MRI);
 
   void getAnalysisUsage(AnalysisUsage &AU) const override {
     AU.setPreservesCFG();
@@ -788,6 +790,47 @@ bool RISCVOptWInstrs::appendWSuffixes(MachineFunction &MF,
   return MadeChange;
 }
 
+bool RISCVOptWInstrs::convertZExtLoads(MachineFunction &MF,
+                                      const RISCVInstrInfo &TII,
+                                      const RISCVSubtarget &ST,
+                                      MachineRegisterInfo &MRI) {
+  bool MadeChange = false;
+  for (MachineBasicBlock &MBB : MF) {
+    for (MachineInstr &MI : MBB) {
+      unsigned WOpc;
+      int UsersWidth;
+      switch (MI.getOpcode()) {
+      default:
+        continue;
+      case RISCV::LBU:
+        WOpc = RISCV::LB;
+        UsersWidth = 8;
+        break;
+      case RISCV::LHU:
+        WOpc = RISCV::LH;
+        UsersWidth = 16;
+        break;
+      case RISCV::LWU:
+        WOpc = RISCV::LW;
+        UsersWidth = 32;
+        break;
+      }
+
+      if (hasAllNBitUsers(MI, ST, MRI, UsersWidth)) {
+        LLVM_DEBUG(dbgs() << "Replacing " << MI);
+        MI.setDesc(TII.get(WOpc));
+        MI.clearFlag(MachineInstr::MIFlag::NoSWrap);
+        MI.clearFlag(MachineInstr::MIFlag::NoUWrap);
+        MI.clearFlag(MachineInstr::MIFlag::IsExact);
+        LLVM_DEBUG(dbgs() << "     with " << MI);
+        MadeChange = true;
+      }
+    }
+  }
+
+  return MadeChange;
+}
+
 bool RISCVOptWInstrs::runOnMachineFunction(MachineFunction &MF) {
   if (skipFunction(MF.getFunction()))
     return false;
@@ -808,5 +851,7 @@ bool RISCVOptWInstrs::runOnMachineFunction(MachineFunction &MF) {
   if (ST.preferWInst())
     MadeChange |= appendWSuffixes(MF, TII, ST, MRI);
 
+  MadeChange |= convertZExtLoads(MF, TII, ST, MRI);
+
   return MadeChange;
 }
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll b/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
index a49e94f4bc910..620c5ecc6c1e7 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/double-convert.ll
@@ -246,17 +246,11 @@ define double @fcvt_d_wu(i32 %a) nounwind {
 }
 
 define double @fcvt_d_wu_load(ptr %p) nounwind {
-; RV32IFD-LABEL: fcvt_d_wu_load:
-; RV32IFD:       # %bb.0:
-; RV32IFD-NEXT:    lw a0, 0(a0)
-; RV32IFD-NEXT:    fcvt.d.wu fa0, a0
-; RV32IFD-NEXT:    ret
-;
-; RV64IFD-LABEL: fcvt_d_wu_load:
-; RV64IFD:       # %bb.0:
-; RV64IFD-NEXT:    lwu a0, 0(a0)
-; RV64IFD-NEXT:    fcvt.d.wu fa0, a0
-; RV64IFD-NEXT:    ret
+; CHECKIFD-LABEL: fcvt_d_wu_load:
+; CHECKIFD:       # %bb.0:
+; CHECKIFD-NEXT:    lw a0, 0(a0)
+; CHECKIFD-NEXT:    fcvt.d.wu fa0, a0
+; CHECKIFD-NEXT:    ret
 ;
 ; RV32I-LABEL: fcvt_d_wu_load:
 ; RV32I:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll b/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
index fa093623dd6f8..bbea7929a304e 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/float-convert.ll
@@ -232,17 +232,11 @@ define float @fcvt_s_wu(i32 %a) nounwind {
 }
 
 define float @fcvt_s_wu_load(ptr %p) nounwind {
-; RV32IF-LABEL: fcvt_s_wu_load:
-; RV32IF:       # %bb.0:
-; RV32IF-NEXT:    lw a0, 0(a0)
-; RV32IF-NEXT:    fcvt.s.wu fa0, a0
-; RV32IF-NEXT:    ret
-;
-; RV64IF-LABEL: fcvt_s_wu_load:
-; RV64IF:       # %bb.0:
-; RV64IF-NEXT:    lwu a0, 0(a0)
-; RV64IF-NEXT:    fcvt.s.wu fa0, a0
-; RV64IF-NEXT:    ret
+; CHECKIF-LABEL: fcvt_s_wu_load:
+; CHECKIF:       # %bb.0:
+; CHECKIF-NEXT:    lw a0, 0(a0)
+; CHECKIF-NEXT:    fcvt.s.wu fa0, a0
+; CHECKIF-NEXT:    ret
 ;
 ; RV32I-LABEL: fcvt_s_wu_load:
 ; RV32I:       # %bb.0:
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll
index 9690302552090..65838f51fc920 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbb.ll
@@ -748,7 +748,7 @@ define signext i32 @ctpop_i32_load(ptr %p) nounwind {
 ;
 ; RV64ZBB-LABEL: ctpop_i32_load:
 ; RV64ZBB:       # %bb.0:
-; RV64ZBB-NEXT:    lwu a0, 0(a0)
+; RV64ZBB-NEXT:    lw a0, 0(a0)
 ; RV64ZBB-NEXT:    cpopw a0, a0
 ; RV64ZBB-NEXT:    ret
   %a = load i32, ptr %p
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll
index cd59c9e01806d..ba058ca0b500a 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/rv64zbkb.ll
@@ -114,7 +114,7 @@ define i64 @pack_i64_2(i32 signext %a, i32 signext %b) nounwind {
 define i64 @pack_i64_3(ptr %0, ptr %1) {
 ; RV64I-LABEL: pack_i64_3:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    lwu a0, 0(a0)
+; RV64I-NEXT:    lw a0, 0(a0)
 ; RV64I-NEXT:    lwu a1, 0(a1)
 ; RV64I-NEXT:    slli a0, a0, 32
 ; RV64I-NEXT:    or a0, a0, a1
@@ -122,8 +122,8 @@ define i64 @pack_i64_3(ptr %0, ptr %1) {
 ;
 ; RV64ZBKB-LABEL: pack_i64_3:
 ; RV64ZBKB:       # %bb.0:
-; RV64ZBKB-NEXT:    lwu a0, 0(a0)
-; RV64ZBKB-NEXT:    lwu a1, 0(a1)
+; RV64ZBKB-NEXT:    lw a0, 0(a0)
+; RV64ZBKB-NEXT:    lw a1, 0(a1)
 ; RV64ZBKB-NEXT:    pack a0, a1, a0
 ; RV64ZBKB-NEXT:    ret
   %3 = load i32, ptr %0, align 4
diff --git a/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll b/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
index 69519c00f88ea..27c6d0240f987 100644
--- a/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
+++ b/llvm/test/CodeGen/RISCV/GlobalISel/wide-scalar-shift-by-byte-multiple-legalization.ll
@@ -8,13 +8,13 @@ define void @lshr_4bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a3, 1(a0)
 ; RV64I-NEXT:    lbu a4, 0(a0)
 ; RV64I-NEXT:    lbu a5, 2(a0)
-; RV64I-NEXT:    lbu a0, 3(a0)
+; RV64I-NEXT:    lb a0, 3(a0)
 ; RV64I-NEXT:    slli a3, a3, 8
 ; RV64I-NEXT:    or a3, a3, a4
 ; RV64I-NEXT:    lbu a4, 0(a1)
 ; RV64I-NEXT:    lbu a6, 1(a1)
 ; RV64I-NEXT:    lbu a7, 2(a1)
-; RV64I-NEXT:    lbu a1, 3(a1)
+; RV64I-NEXT:    lb a1, 3(a1)
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    or a0, a0, a5
 ; RV64I-NEXT:    slli a6, a6, 8
@@ -85,13 +85,13 @@ define void @shl_4bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a3, 1(a0)
 ; RV64I-NEXT:    lbu a4, 0(a0)
 ; RV64I-NEXT:    lbu a5, 2(a0)
-; RV64I-NEXT:    lbu a0, 3(a0)
+; RV64I-NEXT:    lb a0, 3(a0)
 ; RV64I-NEXT:    slli a3, a3, 8
 ; RV64I-NEXT:    or a3, a3, a4
 ; RV64I-NEXT:    lbu a4, 0(a1)
 ; RV64I-NEXT:    lbu a6, 1(a1)
 ; RV64I-NEXT:    lbu a7, 2(a1)
-; RV64I-NEXT:    lbu a1, 3(a1)
+; RV64I-NEXT:    lb a1, 3(a1)
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    or a0, a0, a5
 ; RV64I-NEXT:    slli a6, a6, 8
@@ -162,13 +162,13 @@ define void @ashr_4bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a3, 1(a0)
 ; RV64I-NEXT:    lbu a4, 0(a0)
 ; RV64I-NEXT:    lbu a5, 2(a0)
-; RV64I-NEXT:    lbu a0, 3(a0)
+; RV64I-NEXT:    lb a0, 3(a0)
 ; RV64I-NEXT:    slli a3, a3, 8
 ; RV64I-NEXT:    or a3, a3, a4
 ; RV64I-NEXT:    lbu a4, 0(a1)
 ; RV64I-NEXT:    lbu a6, 1(a1)
 ; RV64I-NEXT:    lbu a7, 2(a1)
-; RV64I-NEXT:    lbu a1, 3(a1)
+; RV64I-NEXT:    lb a1, 3(a1)
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    or a0, a0, a5
 ; RV64I-NEXT:    slli a6, a6, 8
@@ -244,25 +244,25 @@ define void @lshr_8bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu a0, 7(a0)
+; RV64I-NEXT:    lb a0, 7(a0)
 ; RV64I-NEXT:    slli a4, a4, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a3, a4, a3
 ; RV64I-NEXT:    or a4, a6, a5
-; RV64I-NEXT:    lbu a5, 0(a1)
-; RV64I-NEXT:    lbu a6, 1(a1)
-; RV64I-NEXT:    lbu t2, 2(a1)
-; RV64I-NEXT:    lbu t3, 3(a1)
+; RV64I-NEXT:    lb a5, 0(a1)
+; RV64I-NEXT:    lb a6, 1(a1)
+; RV64I-NEXT:    lb t2, 2(a1)
+; RV64I-NEXT:    lb t3, 3(a1)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a7, t0, a7
 ; RV64I-NEXT:    or a0, a0, t1
 ; RV64I-NEXT:    or a5, a6, a5
-; RV64I-NEXT:    lbu a6, 4(a1)
-; RV64I-NEXT:    lbu t0, 5(a1)
-; RV64I-NEXT:    lbu t1, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a6, 4(a1)
+; RV64I-NEXT:    lb t0, 5(a1)
+; RV64I-NEXT:    lb t1, 6(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t3, t3, 8
 ; RV64I-NEXT:    or t2, t3, t2
 ; RV64I-NEXT:    slli t0, t0, 8
@@ -395,25 +395,25 @@ define void @shl_8bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu a0, 7(a0)
+; RV64I-NEXT:    lb a0, 7(a0)
 ; RV64I-NEXT:    slli a4, a4, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a3, a4, a3
 ; RV64I-NEXT:    or a4, a6, a5
-; RV64I-NEXT:    lbu a5, 0(a1)
-; RV64I-NEXT:    lbu a6, 1(a1)
-; RV64I-NEXT:    lbu t2, 2(a1)
-; RV64I-NEXT:    lbu t3, 3(a1)
+; RV64I-NEXT:    lb a5, 0(a1)
+; RV64I-NEXT:    lb a6, 1(a1)
+; RV64I-NEXT:    lb t2, 2(a1)
+; RV64I-NEXT:    lb t3, 3(a1)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a7, t0, a7
 ; RV64I-NEXT:    or a0, a0, t1
 ; RV64I-NEXT:    or a5, a6, a5
-; RV64I-NEXT:    lbu a6, 4(a1)
-; RV64I-NEXT:    lbu t0, 5(a1)
-; RV64I-NEXT:    lbu t1, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a6, 4(a1)
+; RV64I-NEXT:    lb t0, 5(a1)
+; RV64I-NEXT:    lb t1, 6(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t3, t3, 8
 ; RV64I-NEXT:    or t2, t3, t2
 ; RV64I-NEXT:    slli t0, t0, 8
@@ -541,25 +541,25 @@ define void @ashr_8bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu a0, 7(a0)
+; RV64I-NEXT:    lb a0, 7(a0)
 ; RV64I-NEXT:    slli a4, a4, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a3, a4, a3
 ; RV64I-NEXT:    or a4, a6, a5
-; RV64I-NEXT:    lbu a5, 0(a1)
-; RV64I-NEXT:    lbu a6, 1(a1)
-; RV64I-NEXT:    lbu t2, 2(a1)
-; RV64I-NEXT:    lbu t3, 3(a1)
+; RV64I-NEXT:    lb a5, 0(a1)
+; RV64I-NEXT:    lb a6, 1(a1)
+; RV64I-NEXT:    lb t2, 2(a1)
+; RV64I-NEXT:    lb t3, 3(a1)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli a0, a0, 8
 ; RV64I-NEXT:    slli a6, a6, 8
 ; RV64I-NEXT:    or a7, t0, a7
 ; RV64I-NEXT:    or a0, a0, t1
 ; RV64I-NEXT:    or a5, a6, a5
-; RV64I-NEXT:    lbu a6, 4(a1)
-; RV64I-NEXT:    lbu t0, 5(a1)
-; RV64I-NEXT:    lbu t1, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a6, 4(a1)
+; RV64I-NEXT:    lb t0, 5(a1)
+; RV64I-NEXT:    lb t1, 6(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t3, t3, 8
 ; RV64I-NEXT:    or t2, t3, t2
 ; RV64I-NEXT:    slli t0, t0, 8
@@ -695,7 +695,7 @@ define void @lshr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -707,7 +707,7 @@ define void @lshr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -729,7 +729,7 @@ define void @lshr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1028,7 +1028,7 @@ define void @lshr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -1040,7 +1040,7 @@ define void @lshr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1062,7 +1062,7 @@ define void @lshr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1361,7 +1361,7 @@ define void @shl_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -1373,7 +1373,7 @@ define void @shl_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1395,7 +1395,7 @@ define void @shl_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1690,7 +1690,7 @@ define void @shl_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) nounw
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -1702,7 +1702,7 @@ define void @shl_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) nounw
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -1724,7 +1724,7 @@ define void @shl_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) nounw
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2020,7 +2020,7 @@ define void @ashr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -2032,7 +2032,7 @@ define void @ashr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2054,7 +2054,7 @@ define void @ashr_16bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2353,7 +2353,7 @@ define void @ashr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -2365,7 +2365,7 @@ define void @ashr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a5, 12(a0)
 ; RV64I-NEXT:    lbu a6, 13(a0)
 ; RV64I-NEXT:    lbu s0, 14(a0)
-; RV64I-NEXT:    lbu a0, 15(a0)
+; RV64I-NEXT:    lb a0, 15(a0)
 ; RV64I-NEXT:    slli t0, t0, 8
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2387,7 +2387,7 @@ define void @ashr_16bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu t3, 4(a1)
 ; RV64I-NEXT:    lbu t4, 5(a1)
 ; RV64I-NEXT:    lbu s0, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli t6, t6, 8
 ; RV64I-NEXT:    or t5, t6, t5
 ; RV64I-NEXT:    slli t4, t4, 8
@@ -2697,7 +2697,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -2705,7 +2705,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu s0, 12(a0)
 ; RV64I-NEXT:    lbu s1, 13(a0)
 ; RV64I-NEXT:    lbu s2, 14(a0)
-; RV64I-NEXT:    lbu s3, 15(a0)
+; RV64I-NEXT:    lb s3, 15(a0)
 ; RV64I-NEXT:    lbu s4, 16(a0)
 ; RV64I-NEXT:    lbu s5, 17(a0)
 ; RV64I-NEXT:    lbu s6, 18(a0)
@@ -2719,7 +2719,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu s8, 20(a0)
 ; RV64I-NEXT:    lbu s9, 21(a0)
 ; RV64I-NEXT:    lbu s10, 22(a0)
-; RV64I-NEXT:    lbu s11, 23(a0)
+; RV64I-NEXT:    lb s11, 23(a0)
 ; RV64I-NEXT:    slli t2, t2, 8
 ; RV64I-NEXT:    slli t4, t4, 8
 ; RV64I-NEXT:    slli t6, t6, 8
@@ -2741,7 +2741,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu s2, 28(a0)
 ; RV64I-NEXT:    lbu s3, 29(a0)
 ; RV64I-NEXT:    lbu s4, 30(a0)
-; RV64I-NEXT:    lbu a0, 31(a0)
+; RV64I-NEXT:    lb a0, 31(a0)
 ; RV64I-NEXT:    slli s9, s9, 8
 ; RV64I-NEXT:    slli s11, s11, 8
 ; RV64I-NEXT:    slli t6, t6, 8
@@ -2763,7 +2763,7 @@ define void @lshr_32bytes(ptr %src.ptr, ptr %byteOff.ptr, ptr %dst) nounwind {
 ; RV64I-NEXT:    lbu a0, 4(a1)
 ; RV64I-NEXT:    lbu s1, 5(a1)
 ; RV64I-NEXT:    lbu s4, 6(a1)
-; RV64I-NEXT:    lbu a1, 7(a1)
+; RV64I-NEXT:    lb a1, 7(a1)
 ; RV64I-NEXT:    slli s8, s8, 8
 ; RV64I-NEXT:    or s7, s8, s7
 ; RV64I-NEXT:    slli s1, s1, 8
@@ -3621,7 +3621,7 @@ define void @lshr_32bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu a7, 4(a0)
 ; RV64I-NEXT:    lbu t0, 5(a0)
 ; RV64I-NEXT:    lbu t1, 6(a0)
-; RV64I-NEXT:    lbu t2, 7(a0)
+; RV64I-NEXT:    lb t2, 7(a0)
 ; RV64I-NEXT:    lbu t3, 8(a0)
 ; RV64I-NEXT:    lbu t4, 9(a0)
 ; RV64I-NEXT:    lbu t5, 10(a0)
@@ -3629,7 +3629,7 @@ define void @lshr_32bytes_wordOff(ptr %src.ptr, ptr %wordOff.ptr, ptr %dst) noun
 ; RV64I-NEXT:    lbu s0, 12(a0)
 ; RV64I-NEXT:    lbu s1, 13(a0)
 ; RV64I-NEXT:    lbu s2, 14(a0)
-; RV64I-NEXT:    lbu s3, 15(a0)
+; RV64I-NEXT:    lb s3, 15(a0)
 ; RV64I-NEXT:    lbu s4, 16(a0)
 ; RV64I-NEXT:    ...
[truncated]

@llvmbot
Member

llvmbot commented Jun 18, 2025

@llvm/pr-subscribers-llvm-globalisel



github-actions bot commented Jun 18, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@topperc
Collaborator

topperc commented Jun 18, 2025

If I remember right, there is no c.lb in Zcb. Does this make compression worse?

@asb
Contributor Author

asb commented Jun 18, 2025

If I remember right, there is no c.lb in Zcb. Does this make compression worse?

You're right, I'll update to exclude LB.

EDIT: Now pushed, and patch description updated.

@topperc
Collaborator

topperc commented Jun 18, 2025

Have you looked at doing this during instruction selection using the hasAllNBitUsers in RISCVISelDAGToDAG.cpp? That isn't restricted to RV64. Though it won't work across basic blocks.

@topperc
Collaborator

topperc commented Jun 18, 2025

I think changing LWU->LW is useful for compression and minimizing RV32/RV64 delta.

I'm skeptical that LWU->LW and LHU->LH will help us match gcc better. Here are trivial examples where LLVM used LH/LW and gcc used LHU/LWU. https://godbolt.org/z/cbje4aT7P I guess it could be better on average, but it certainly doesn't guarantee a match to gcc.

@asb
Contributor Author

asb commented Jun 19, 2025

I think changing LWU->LW is useful for compression and minimizing RV32/RV64 delta.

Agreed.

I'm skeptical that LWU->LW and LHU->LH will help us match gcc better. Here are trivial examples where LLVM used LH/LW and gcc used LHU/LWU. https://godbolt.org/z/cbje4aT7P I guess it could be better on average, but it certainly doesn't guarantee a match to gcc.

That may be true. I spotted this while looking at a workload where we had some LWU instructions that GCC didn't, and wondering if it was an indicator of us overall making some different/worse choices in terms of sign/zero extension (and of course found it was just a case where either was equivalent), and I'm sure I've seen the same before. But I totally believe there are other cases where we make different choices. I'll quantify how often it kicks in, but I probably wouldn't mind dropping LHU->LH. Perhaps the argument for this kind of change is more just "canonicalisation".

Have you looked at doing this during instruction selection using the hasAllNBitUsers in RISCVISelDAGToDAG.cpp

I'll try that and report back.

@asb asb changed the title from "[RISCV] Switch to sign-extended loads if possible in RISCVOptWInstrs" to "[RISCV] Convert LWU to LW if possible in RISCVOptWInstrs" Jun 25, 2025
@asb
Contributor Author

asb commented Jun 25, 2025

Have you looked at doing this during instruction selection using the hasAllNBitUsers in RISCVISelDAGToDAG.cpp

I'm limiting scope to just LWU => LW for now (there perhaps isn't much of an argument for LHU => LH beyond picking a "canonical form"). I implemented this at ISel with the following patch:

--- a/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
+++ b/llvm/lib/Target/RISCV/GISel/RISCVInstructionSelector.cpp
@@ -208,7 +208,8 @@ bool RISCVInstructionSelector::hasAllNBitUsers(const MachineInstr &MI,
           MI.getOpcode() == TargetOpcode::G_AND ||
           MI.getOpcode() == TargetOpcode::G_OR ||
           MI.getOpcode() == TargetOpcode::G_XOR ||
-          MI.getOpcode() == TargetOpcode::G_SEXT_INREG || Depth != 0) &&
+          MI.getOpcode() == TargetOpcode::G_SEXT_INREG ||
+          MI.getOpcode() == TargetOpcode::G_ZEXTLOAD || Depth != 0) &&
          "Unexpected opcode");

   if (Depth >= RISCVInstructionSelector::MaxRecursionDepth)
--- a/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
+++ b/llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
@@ -3532,7 +3532,8 @@ bool RISCVDAGToDAGISel::hasAllNBitUsers(SDNode *Node, unsigned Bits,
           Node->getOpcode() == ISD::SRL || Node->getOpcode() == ISD::AND ||
           Node->getOpcode() == ISD::OR || Node->getOpcode() == ISD::XOR ||
           Node->getOpcode() == ISD::SIGN_EXTEND_INREG ||
-          isa<ConstantSDNode>(Node) || Depth != 0) &&
+          Node->getOpcode() == ISD::LOAD || isa<ConstantSDNode>(Node) ||
+          Depth != 0) &&
          "Unexpected opcode");

   if (Depth >= SelectionDAG::MaxRecursionDepth)
--- a/llvm/lib/Target/RISCV/RISCVInstrInfo.td
+++ b/llvm/lib/Target/RISCV/RISCVInstrInfo.td
@@ -2083,6 +2083,13 @@ class binop_allwusers<SDPatternOperator operator>
   let GISelPredicateCode = [{ return hasAllWUsers(MI); }];
 }

+class unaryop_allwusers<SDPatternOperator operator>
+    : PatFrag<(ops node:$arg), (i64(operator node:$arg)), [{
+  return hasAllWUsers(N);
+}]> {
+  let GISelPredicateCode = [{ return hasAllWUsers(MI); }];
+}
+
 def sexti32_allwusers : PatFrag<(ops node:$src),
                                 (sext_inreg node:$src, i32), [{
   return hasAllWUsers(N);
@@ -2157,6 +2164,7 @@ def : Pat<(or_is_add 33signbits_node:$rs1, simm12:$imm),

 def : LdPat<sextloadi32, LW, i64>;
 def : LdPat<extloadi32, LW, i64>;
+def : LdPat<unaryop_allwusers<zextloadi32>, LW, i64>;
 def : LdPat<zextloadi32, LWU, i64>;
 def : LdPat<load, LD, i64>;

I've found that this is much less effective than the RISCVOptWInstrs change:

  • Doing it at ISel results in ~4600 instructions changed across the test suite
  • Doing it in RISCVOptWInstrs (updated to do LWU=>LW only) changes ~20500 instructions
  • There is no additional benefit in doing both (as you'd expect - but I ran this just to check).

I'm about to push changes to this patch to limit it to just the LWU change. Looking at the diffs between doing it at ISel and this way, there are some cases that at first glance I would have thought would have been handled - I'll pick through a couple just to check there's nothing surprising going on.

Member

@lenary lenary left a comment


LGTM

@lenary
Member

lenary commented Jul 3, 2025

To maybe add some more info to my LGTM: I accept this is probably not the perfect place for this, but it's a place that seems to work.

@@ -808,5 +831,7 @@ bool RISCVOptWInstrs::runOnMachineFunction(MachineFunction &MF) {
if (ST.preferWInst())
MadeChange |= appendWSuffixes(MF, TII, ST, MRI);

MadeChange |= convertZExtLoads(MF, TII, ST, MRI);
Collaborator


Can we combine stripWSuffixes/appendWSuffixes/convertZExtLoads into a single function that walks over the function once and does the right thing?
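
For illustration, a minimal sketch of that suggestion is below. The function name is hypothetical and the opcode lists and legality checks are abbreviated; this is not the actual implementation, just the shape of a single walk dispatching to the existing rewrites:

bool RISCVOptWInstrs::processFunction(MachineFunction &MF,
                                      const RISCVInstrInfo &TII,
                                      const RISCVSubtarget &ST,
                                      MachineRegisterInfo &MRI) {
  bool MadeChange = false;
  for (MachineBasicBlock &MBB : MF) {
    for (MachineInstr &MI : MBB) {
      switch (MI.getOpcode()) {
      default:
        break;
      case RISCV::ADDW: // ...plus the other W-form candidates.
        // W-suffix stripping (what stripWSuffixes does today), subject to the
        // same legality checks as the existing code.
        break;
      case RISCV::ADD: // ...plus the other candidates for gaining a W suffix.
        // W-suffix appending when ST.preferWInst() (appendWSuffixes today).
        break;
      case RISCV::LWU:
        // Zero-extending load conversion (convertZExtLoads in this patch);
        // the MI flag clearing from the patch is omitted here for brevity.
        if (hasAllNBitUsers(MI, ST, MRI, 32)) {
          MI.setDesc(TII.get(RISCV::LW));
          MadeChange = true;
        }
        break;
      }
    }
  }
  return MadeChange;
}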
